Ensifer (syn. Sinorhizobium) meliloti is an important symbiotic bacterial species that fixes nitrogen. Strains B021CC and AK58 were previously investigated for their substrate utilization and plant-growth-promoting abilities, which revealed interesting features. Here, we describe the complete genome sequence and annotation of these strains. The B021CC and AK58 genomes are 6,985,065 and 6,974,333 bp long, with 6,746 and 6,992 predicted genes, respectively.



Introduction 



Strains AK58 and B021CC belong to the species Ensifer (syn. Sinorhizobium) meliloti (Alphaproteobacteria, Rhizobiales, Rhizobiaceae, Sinorhizobium/Ensifer group) [1,2], an important symbiotic nitrogen-fixing bacterial species that associates with the roots of leguminous plants of several genera, mainly Melilotus, Medicago and Trigonella [3]. These strains were originally isolated from Medicago spp. during a long-term experiment (B021CC) and from plants collected in the northern Aral Sea region, Kazakhstan (AK58). Previous analyses by comparative genomic hybridization (CGH), nodulation tests and Phenotype Microarray™ (Biolog Inc.) showed that AK58 (= DSM 23808) and B021CC (= DSM 23809) are highly diverse in both genomic and phenotypic properties. In particular, they show different symbiotic phenotypes with the crop legume Medicago sativa L. [4,5]. In a previous collaboration with DOE-JGI, the genomes of strains AK83 (= DSM 23913) and BL225C (= DSM 23914) were also sequenced, allowing the identification of putative genetic determinants of their different symbiotic phenotypes [6]. Interest in strains AK58 and B021CC followed, since genomic analysis of these strains would foster a greater understanding of the E. meliloti pangenome [7] and facilitate deeper investigation of the genomic determinants responsible for differences in symbiotic performance between E. meliloti strains found in nature. These research goals may lead to improved strain selection and better inoculants for the legume crop M. sativa.
Classification and features

Representative genomic 16S rRNA sequences of strains AK58 and B021CC were compared with those in the Ribosomal Database using the Sequence Match module of the Ribosomal Database Project [8]. Representative genomic 16S rRNA sequences of close phylogenetic relatives within the genus Ensifer/Sinorhizobium, and of other members of the order Rhizobiales (as outgroup), were then selected from the IMG-ER database [16] (Table 1). All strains of the genus Ensifer/Sinorhizobium, including AK58 and B021CC, form a close cluster, confirming the affiliation of these two strains within the species. Figure 1 shows the phylogenetic neighborhood of E. meliloti AK58 and B021CC in a 16S rRNA-based tree.

E. meliloti AK58 and B021CC show different symbiotic phenotypes with the host plant Medicago sativa, as well as differences in substrate utilization [5]. They also differ in cell morphology: AK58 cells are smaller than those of B021CC and of the other E. meliloti strains whose genome sequences are available (Figure 2). Interestingly, B021CC cells have a ratio between cell axes closer to 1 (i.e., they are more rounded) than those of AK58 and the other E. meliloti strains (Figure 2).

Genome sequencing information 

Genome project history 

Strains AK58 and B021CC were selected for sequencing under the 2010 Community Sequencing Program of the DOE Joint Genome Institute (JGI), within the project entitled "Complete genome sequencing of Sinorhizobium meliloti AK58 and B021CC strains: Improving alfalfa performances through the exploitation of Sinorhizobium genomic data". The overall rationale for sequencing their genomes was to identify the genomic determinants of the different symbiotic performance of S. meliloti strains. The genome project is deposited in the Genomes OnLine Database [21] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE-JGI. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation 

E. meliloti strains AK58 and B021CC (DSM 23808 and DSM 23809, respectively) were grown in DSMZ medium 98 (Rhizobium medium) [22] at 28 °C. DNA was isolated from 0.5-1 g of cell paste using the Jetflex Genomic DNA Purification kit (GENOMED 600100), following the standard protocol recommended by the manufacturer, with modification st/LALMP [23] for strain AK58 and with an additional 1-hour incubation with 5 µl proteinase K at 58 °C for strain B021CC. DNA will be available on request through the DNA Bank Network [24].

Genome sequencing and assembly 

The draft genomes were generated at the DOE Joint Genome Institute (JGI) using Illumina data [25]. For the B021CC genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 76,033,356 reads, and an Illumina long-insert paired-end library with an average insert size of 9,141.74 ± 1,934.63 bp, which generated 4,563,348 reads, totaling 6,463 Mbp of Illumina data. For AK58, a combination of Illumina [25] and 454 technologies [26] was used: we constructed and sequenced an Illumina GAII shotgun library, which generated 80,296,956 reads totaling 6,102.6 Mb, a 454 Titanium standard library, which generated 0 reads, and one paired-end 454 library with an average insert size of 10 kb, which generated 326,569 reads totaling 96 Mb of 454 data. All general aspects of library construction and sequencing performed at the JGI can be found at [27]. The initial draft assemblies contained 194 contigs in 16 scaffolds for B021CC and 311 contigs in 5 scaffolds for AK58.

For B021CC, the initial draft data were assembled with Allpaths, and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet, version 1.1.05 [28], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads. The Illumina draft data were then assembled again with Velvet, using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies, together with a subset of the Illumina CLIP paired-end reads, were assembled using parallel phrap, version 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected by manual editing in Consed [29-31].
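The shredding step described above cuts a consensus sequence into overlapping fixed-length fake reads that a second assembler can consume alongside real reads. A minimal sketch of the idea follows; the overlap between consecutive shreds is not stated in the text, so the `step` parameter and the placeholder sequence are assumptions for illustration, and this is not the JGI implementation.

```python
def shred(consensus: str, length: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length "fake reads".

    Shreds start every `step` bases, so neighbours overlap by `length - step`
    bases. An extra shred anchored at the sequence end is added when needed,
    so that every base of the consensus is covered by at least one shred.
    """
    if not 0 < step <= length:
        raise ValueError("require 0 < step <= length")
    if len(consensus) <= length:
        return [consensus]  # too short to shred: keep the whole sequence
    starts = list(range(0, len(consensus) - length + 1, step))
    if starts[-1] + length < len(consensus):
        starts.append(len(consensus) - length)  # cover the tail
    return [consensus[s:s + length] for s in starts]


# Hypothetical usage: 1.5 kbp shreds overlapping by 500 bp (the overlap used
# at JGI is an assumption here) cut from a placeholder 4 kbp sequence.
contig = "ACGT" * 1000
reads = shred(contig, length=1500, step=1000)
print(len(reads), [len(r) for r in reads])  # 4 [1500, 1500, 1500, 1500]
```

Anchoring the last shred at the sequence end keeps the tail of each contig represented, which matters when shreds from several assemblies are pooled for a reconciliation assembly.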
Table 1. Classification and general features of E. meliloti AK58 and B021CC according to the MIGS recommendations [9] and the Names for Life database [10]

MIGS ID | Property | Term | Evidence code
 | Current classification | Domain Bacteria | TAS [11]
 | | Phylum Proteobacteria | TAS [12]
 | | Class Alphaproteobacteria | TAS [12]
 | | Order Rhizobiales | TAS
 | | Family Rhizobiaceae | TAS
 | | Genus Ensifer (syn. Sinorhizobium) | TAS
 | | Species Ensifer meliloti | TAS
 | | Strain B021CC | TAS [4,5]
 | | Strain AK58 | TAS [4,5]
 | Gram stain | negative | TAS [12]
 | Motility | motile | TAS
 | Salinity | NaCl tolerant | TAS [12]
MIGS-22 | Oxygen requirement | Aerobe | TAS [12]
 | Carbon source | carbohydrates and salts of organic acids | TAS [12]
 | Energy metabolism | chemoorganotroph | TAS [12]
MIGS-6 | Habitat | Soil, root nodules of legumes | TAS [3,12]
MIGS-15 | Biotic relationship | free living, symbiont | TAS [12]
MIGS-14 | Pathogenicity | not reported |
 | Biosafety level | 1 | TAS [14]
MIGS-23.1 | Isolation | B021CC: root nodules of Medicago sativa cv. 'Oneida'; AK58: root nodules of Medicago falcata | TAS [4]
MIGS-4 | Geographic location | B021CC: Lodi, Italy; AK58: Kazakhstan | TAS [4]
MIGS-5 | Sample collection time | B021CC: 1997; AK58: 2001 | NAS
MIGS-4.1 | Latitude | B021CC: 45.31; AK58: 58.75 | NAS
MIGS-4.2 | Longitude | B021CC: 9.50; AK58: 48.98 | NAS
MIGS-4.3 | Depth | not reported |
MIGS-4.4 | Altitude | B021CC: 70 m; AK58: 305 m | NAS

Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [15].
Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished) and sequencing of bridging PCR fragments with Sanger and/or PacBio (unpublished, Cliff Han) technologies. For improved high-quality draft and non-contiguous finished projects, one round of manual/wet-lab finishing may have been completed. Primer walks, shatter libraries and/or subsequent PCR reads may also be included for a finished project. A total of 128 additional sequencing reactions and 126 PCR PacBio consensus sequences were completed to close gaps and to raise the quality of the final sequence. The estimated size of the B021CC genome is 7.1 Mb, and the final assembly is based on 6,463 Mbp of Illumina draft data, which provides an average 910× coverage of the genome.

For AK58, the 454 Titanium standard data and the 454 paired-end data were assembled together with Newbler, version 2.6 (20110517_1502). The Newbler consensus sequences were computationally shredded into 2 kb overlapping fake reads (shreds). Illumina sequencing data were assembled with Velvet, version 1.1.05 [28], and the consensus sequence was computationally shredded into 1.5 kb overlapping fake reads. We integrated the 454 Newbler consensus shreds, the Illumina Velvet consensus shreds and the read pairs in the 454 paired-end library using parallel phrap, version SPS-4.24 (High Performance Software, LLC). The software Consed [29-31] was used in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alia Lapidus, unpublished). Possible mis-assemblies were corrected using gapResolution (Cliff Han, unpublished), Dupfinisher [32], or sequencing of cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR (J-F Cheng, unpublished) primer walks. No additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The estimated genome size of AK58 is 7 Mb, and the final assembly is based on 61.5 Mb of 454 draft data, which provides an average 8.8× coverage of the genome, and 420 Mb of Illumina draft data, which provides an average 60× coverage of the genome.
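The average coverage values quoted for both strains follow from dividing the amount of sequence data by the estimated genome size. The sketch below reproduces that arithmetic using only figures stated in the text; treating 6,463 Mbp as 6,463 Mb of sequence is the only assumption.

```python
def average_coverage(total_bases_mb: float, genome_size_mb: float) -> float:
    """Average sequencing depth = total sequenced bases / estimated genome size."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_mb / genome_size_mb


# Figures taken from the text (all in Mb).
b021cc_illumina = average_coverage(6463, 7.1)
ak58_454 = average_coverage(61.5, 7.0)
ak58_illumina = average_coverage(420, 7.0)

print(f"B021CC Illumina: {b021cc_illumina:.0f}x")  # B021CC Illumina: 910x
print(f"AK58 454: {ak58_454:.1f}x")                # AK58 454: 8.8x
print(f"AK58 Illumina: {ak58_illumina:.0f}x")      # AK58 Illumina: 60x
```

This is a mean depth only; local coverage along the assembly varies, so it says nothing about how evenly the reads are distributed.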

[Figure 1: 16S rRNA-based phylogenetic tree. Taxa shown: Sinorhizobium meliloti BL225C, SM11, 1021, CIAM1775 and AK83; Ensifer meliloti B021CC and AK58; Sinorhizobium medicae WSM419; Sinorhizobium arboris LMG14919; Sinorhizobium fredii HH103 and NGR234; Sinorhizobium terangae WSM1721; Mesorhizobium loti USDA 3471; Rhizobium etli CFN42; Rhizobium leguminosarum bv. viciae 3841; Azorhizobium caulinodans ORS 571; Bradyrhizobium japonicum USDA110]

Figure 1. Phylogenetic consensus tree showing the position of E. meliloti AK58 and B021CC within the Ensifer/Sinorhizobium genus. The tree was inferred using the Maximum Likelihood method based on the Tamura 3-parameter model [17], chosen as the model with the lowest BIC (Bayesian Information Criterion) score after Maximum Likelihood fits of 24 different nucleotide substitution models (Model Test). The bootstrap consensus tree inferred from 500 replicates is taken to represent the phylogenetic pattern of the taxa analyzed [18]. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches. The tree with the highest log likelihood (-3411.7124) is shown. A discrete Gamma distribution was used to model evolutionary rate differences among sites (G, parameter = 0.3439). A total of 1,284 nucleotide positions were present in the final dataset. Model testing and Maximum Likelihood inference were conducted in MEGA5 [19]. E. meliloti AK58 and B021CC are shown in bold.
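The model-selection step in the Figure 1 caption picks the substitution model with the lowest BIC. As a sketch, assuming the standard definition BIC = k·ln(n) − 2·ln(L) with n taken as the number of alignment positions, the comparison looks like the code below. The model names, log-likelihoods and parameter counts are hypothetical placeholders, not values from this study, and MEGA's exact parameter counting may differ.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]], n_sites: int) -> str:
    """Return the name of the model with the lowest BIC among (lnL, k) fits."""
    return min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_sites))


# HYPOTHETICAL fits: (log-likelihood, number of free parameters).
fits = {
    "model_A": (-3420.0, 35),
    "model_B": (-3415.0, 40),
    "model_C": (-3410.0, 45),
}
print(best_model(fits, n_sites=1284))  # model_A
```

In this toy example the richer models gain 10 log-likelihood units in −2·ln(L) per step but pay about 36 units of penalty for five extra parameters, so the simplest model wins; that trade-off is what makes BIC favour parsimonious models.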
Figure 2. Cell morphology and cell size analysis of E. meliloti strains. Cell size was analyzed with Pixcavator IA 5.1.0.0 software [20] in logarithmically growing cultures (OD600 = 0.6) in TY medium of AK58, B021CC and the other completely sequenced E. meliloti strains. Cell size is expressed as cell area in µm², while roundness is the ratio between the two main axes of the cell. Standard errors from more than 300 individual observations are reported. Different letters indicate significant differences (P < 0.05) after one-way ANOVA.
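The Figure 2 caption combines a per-cell roundness metric with a one-way ANOVA across strains. The sketch below shows both computations under stated assumptions: roundness is taken as minor axis / major axis (so a circle scores 1.0; the caption does not state the direction of the ratio), and the sample values are hypothetical, not measurements from this study. For a p-value in practice, `scipy.stats.f_oneway` returns both the F statistic and P.

```python
from statistics import mean


def roundness(major_axis: float, minor_axis: float) -> float:
    """Axis ratio as minor/major, so a perfect circle scores 1.0 (assumed convention)."""
    return minor_axis / major_axis


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# HYPOTHETICAL roundness values for two strains (not data from Figure 2).
strain_x = [roundness(2.0, 1.0), roundness(2.2, 1.0), roundness(1.8, 1.0)]
strain_y = [roundness(1.4, 1.0), roundness(1.3, 1.0), roundness(1.5, 1.0)]
print(round(one_way_anova_f([strain_x, strain_y]), 2))
```

A large F means the spread between strain means is large relative to cell-to-cell variation within strains; the letters in Figure 2 additionally require a post-hoc comparison, which this sketch does not implement.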



Table 2. Genome sequencing project information

MIGS ID | Property | Term
MIGS-31 | Finishing quality | High-Quality Draft
MIGS-28 | Libraries used | Two genomic libraries: one 454 PE library (9 kb insert size), one Illumina library
MIGS-29 | Sequencing platforms | Illumina GAII, 454 GS FLX Titanium
MIGS-31.2 | Sequencing coverage | Illumina: 60× (AK58), 910× (B021CC); 454 pyrosequencing: 8.8× (AK58)
MIGS-30 | Assemblers | Newbler version 2.3, Velvet version 1.0.13, phrap version 1.080812, Allpaths version 39750
MIGS-32 | Gene calling method | Prodigal
 | GenBank date of release | Pending
 | GOLD ID | B021CC: Gi07569; AK58: Gi07577
 | NCBI project ID | B021CC: 375171; AK58: 928722
 | Database: IMG | B021CC: 9144; AK58: 7327
MIGS-13 | Source material identifier | B021CC: DSM23809; AK58: DSM23808
 | Project relevance | CSP2010, biotechnological, biodiversity
Genome annotation

Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [16].



Genome properties

The High-Quality draft assemblies of the genomes consist of 41 scaffolds for B021CC and 9 scaffolds for AK58, totaling 6,985,065 and 6,974,333 bp, respectively. The overall G+C content is 62.12% for B021CC and 62.04% for AK58 (Tables 3a and 3b). Of the 6,746 and 6,992 genes predicted, 5,357 and 5,549 were protein-coding genes, and 105 and 79 were RNA genes, in B021CC and AK58, respectively. The large majority of the protein-coding genes (79.32% and 78.03% for B021CC and AK58, respectively) were assigned a putative function as COGs. The distribution of genes into COG functional categories is presented in Table 4.



Table 3a. Genome statistics for strain B021CC

Attribute | Value | % of total
Genome size (bp) | 6,985,065 | 100.00%
DNA coding region (bp) | 6,011,953 | 86.07%
DNA G+C content (bp) | 4,339,356 | 62.12%
Number of scaffolds | 41 |
Total genes | 6,746 | 100.00%
RNA genes | 105 | 1.72%
rRNA operons | 3 |
tRNA genes | 58 | 0.86%
Protein-coding genes | |
Genes with function prediction (proteins) | 5,357 | 79.41%
Genes in paralog clusters | 3,275 | 48.55%
Genes assigned to COGs | 5,351 | 79.32%
Genes assigned Pfam domains | 5,318 | 78.83%
Genes with signal peptides | 1,427 | 21.15%
Genes with transmembrane helices | 1,521 | 22.55%



Table 3b. Genome statistics for strain AK58

Attribute | Value | % of total
Genome size (bp) | 6,974,333 | 100.00%
DNA coding region (bp) | 5,914,246 | 84.80%
DNA G+C content (bp) | 4,315,694 | 62.04%
Number of scaffolds | 9 |
Total genes | 6,992 | 100.00%
RNA genes | 79 | 1.13%
rRNA operons | 1* |
tRNA genes | 49 | 0.70%
Protein-coding genes | 6,934 | 98.87%
Genes with function prediction (proteins) | 5,459 | 77.84%
Genes in paralog clusters | 2,912 | 41.52%
Genes assigned to COGs | 5,472 | 78.03%
Genes assigned Pfam domains | 5,420 | 77.29%
Genes with signal peptides | 1,432 | 20.42%
Genes with transmembrane helices | 1,465 | 20.89%

*Only one rRNA operon appears to be complete.
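Percentages in the genome statistics tables are expressed relative to genome size for the base-pair rows and relative to the total number of genes for the gene rows. A minimal sketch of that calculation follows, applied to three B021CC entries from Table 3a; the `gc_percent` helper and its toy input are illustrative additions, not part of the JGI/IMG pipeline.

```python
def percent_of(count: int, total: int) -> float:
    """Express a count as a percentage of a total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def gc_percent(sequence: str) -> float:
    """G+C content of a DNA sequence as a percentage of all its bases."""
    seq = sequence.upper()
    return percent_of(seq.count("G") + seq.count("C"), len(seq))


# B021CC values from Table 3a.
genome_bp, total_genes = 6_985_065, 6_746
print(percent_of(4_339_356, genome_bp))  # 62.12  (DNA G+C content)
print(percent_of(6_011_953, genome_bp))  # 86.07  (DNA coding region)
print(percent_of(3_275, total_genes))    # 48.55  (genes in paralog clusters)
```

Applied to a full assembly, `gc_percent` would reproduce the G+C row directly from the sequence rather than from a precomputed base count.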
Table 4. Number of genes associated with the general COG functional categories

Code | B021CC value | B021CC % | AK58 value | AK58 % | Description
E | 637 | 10.69 | 685 | 11.20 | Amino acid transport and metabolism
G | 604 | 10.14 | 596 | 9.75 | Carbohydrate transport and metabolism
D | 45 | 0.76 | 53 | 0.87 | Cell cycle control, cell division, chromosome partitioning
N | 69 | 1.16 | 68 | 1.11 | Cell motility
M | 305 | 5.12 | 298 | 4.87 | Cell wall/membrane biogenesis
B | 1 | 0.02 | 3 | 0.05 | Chromatin structure and dynamics
H | 202 | 3.39 | 205 | 3.35 | Coenzyme transport and metabolism
V | 64 | 1.17 | 62 | 1.01 | Defense mechanisms
C | 365 | 6.13 | 356 | 5.82 | Energy production and conversion
W | 1 | 0.02 | 1 | 0.02 | Extracellular structures
S | 608 | 10.20 | 617 | 10.09 | Function unknown
R | 730 | 12.25 | 767 | 12.54 | General function prediction only
P | 320 | 5.17 | 294 | 4.81 | Inorganic ion transport and metabolism
U | 104 | 1.75 | 102 | 1.67 | Intracellular trafficking and secretion, and vesicular transport
I | 210 | 3.52 | 217 | 3.55 | Lipid transport and metabolism
F | 107 | 1.80 | 114 | 1.86 | Nucleotide transport and metabolism
O | 185 | 3.10 | 189 | 3.09 | Posttranslational modification, protein turnover, chaperones
L | 273 | 4.58 | 327 | 5.35 | Replication, recombination and repair
Q | 163 | 2.74 | | | Secondary metabolites biosynthesis, transport and catabolism
T | 247 | 4.14 | 249 | 4.07 | Signal transduction mechanisms
K | 524 | 8.79 | 551 | 9.01 | Transcription
J | 195 | 3.27 | 201 | 3.29 | Translation, ribosomal structure and biogenesis
 | 1,395 | 20.68 | 1,541 | 21.97 | Not in COGs